Overview
Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 22699 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 6.5 MiB |
| Average record size in memory | 302.6 B |
Variable types
| Categorical | 4 |
|---|---|
| DateTime | 2 |
| Numeric | 10 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| RatecodeID is highly overall correlated with tolls_amount | High correlation |
| extra is highly overall correlated with improvement_surcharge and 1 other fields | High correlation |
| fare_amount is highly overall correlated with total_amount and 1 other fields | High correlation |
| improvement_surcharge is highly overall correlated with extra and 1 other fields | High correlation |
| mta_tax is highly overall correlated with extra and 1 other fields | High correlation |
| tip_amount is highly overall correlated with total_amount | High correlation |
| tolls_amount is highly overall correlated with RatecodeID | High correlation |
| total_amount is highly overall correlated with fare_amount and 2 other fields | High correlation |
| trip_distance is highly overall correlated with fare_amount and 1 other fields | High correlation |
| store_and_fwd_flag is highly imbalanced (96.0%) | Imbalance |
| payment_type is highly imbalanced (51.5%) | Imbalance |
| mta_tax is highly imbalanced (97.2%) | Imbalance |
| improvement_surcharge is highly imbalanced (99.3%) | Imbalance |
| RatecodeID is highly skewed (γ1 = 117.1412088) | Skewed |
| fare_amount is highly skewed (γ1 = 21.66310069) | Skewed |
| total_amount is highly skewed (γ1 = 20.38940334) | Skewed |
| extra has 11921 (52.5%) zeros | Zeros |
| tip_amount has 8057 (35.5%) zeros | Zeros |
| tolls_amount has 21525 (94.8%) zeros | Zeros |
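The imbalance percentages in the alerts above are consistent with a score of one minus the normalized Shannon entropy of the class counts. That reading is inferred from the numbers in this report rather than stated by it, so the sketch below should be treated as an assumption; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum possible entropy) for class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        # Assumption: a single observed class counts as fully imbalanced.
        return 1.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(len(probs))


# Class counts copied from the per-variable value tables in this report.
class_counts = {
    "store_and_fwd_flag": [22600, 99],
    "payment_type": [15265, 7267, 121, 46],
    "mta_tax": [22596, 90, 13],
    "improvement_surcharge": [22679, 14, 6],
}

scores = {name: imbalance_score(c) for name, c in class_counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.1%}")
```

A score near 100% means almost all rows fall in one class, which is why mta_tax and improvement_surcharge carry little usable signal despite being listed as variables.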
Reproduction
| Analysis started | 2025-12-03 20:37:10.866795 |
|---|---|
| Analysis finished | 2025-12-03 20:37:24.474008 |
| Duration | 13.61 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
Variables
VendorID
Categorical
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 12626 | 55.6% |
| 1 | 10073 | 44.4% |
tpep_pickup_datetime
Date
| Distinct | 22687 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 177.5 KiB |
| Minimum | 2017-01-01 00:08:25 |
| Maximum | 2017-12-31 23:45:30 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
tpep_dropoff_datetime
Date
| Distinct | 22688 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 177.5 KiB |
| Minimum | 2017-01-01 00:17:20 |
| Maximum | 2017-12-31 23:49:24 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.642319 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 33 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.2852311 |
|---|---|
| Coefficient of variation (CV) | 0.78257092 |
| Kurtosis | 3.7105074 |
| Mean | 1.642319 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.172872 |
| Sum | 37279 |
| Variance | 1.651819 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 16117 | 71.0% |
| 2 | 3305 | 14.6% |
| 5 | 1143 | 5.0% |
| 3 | 953 | 4.2% |
| 6 | 693 | 3.1% |
| 4 | 455 | 2.0% |
| 0 | 33 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 33 | 0.1% |
| 1 | 16117 | 71.0% |
| 2 | 3305 | 14.6% |
| 3 | 953 | 4.2% |
| 4 | 455 | 2.0% |
| 5 | 1143 | 5.0% |
| 6 | 693 | 3.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 693 | 3.1% |
| 5 | 1143 | 5.0% |
| 4 | 455 | 2.0% |
| 3 | 953 | 4.2% |
| 2 | 3305 | 14.6% |
| 1 | 16117 | 71.0% |
| 0 | 33 | 0.1% |
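The descriptive statistics for passenger_count can be cross-checked from its frequency table alone. The sketch below assumes the reported variance is the sample variance (dividing by n - 1), which is the convention that reproduces the figures shown here; the variable names are illustrative.

```python
import math

# Value -> count, copied from the passenger_count frequency table.
freq = {0: 33, 1: 16117, 2: 3305, 3: 953, 4: 455, 5: 1143, 6: 693}

n = sum(freq.values())                       # number of observations
total = sum(v * c for v, c in freq.items())  # reported as "Sum"
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"n={n} sum={total} mean={mean:.6f}")
print(f"variance={variance:.6f} std={std:.7f} cv={cv:.8f}")
```

Agreement with the reported Sum, Mean, Variance, Standard deviation and CV suggests the frequency table is complete, that is, passenger_count takes no values other than the seven listed.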
trip_distance
Real number (ℝ)
High correlation
| Distinct | 1545 |
|---|---|
| Distinct (%) | 6.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.9133129 |
| Minimum | 0 |
| Maximum | 33.96 |
| Zeros | 148 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.49 |
| Q1 | 0.99 |
| median | 1.61 |
| Q3 | 3.06 |
| 95-th percentile | 10.531 |
| Maximum | 33.96 |
| Range | 33.96 |
| Interquartile range (IQR) | 2.07 |
Descriptive statistics
| Standard deviation | 3.6531712 |
|---|---|
| Coefficient of variation (CV) | 1.2539577 |
| Kurtosis | 10.4106 |
| Mean | 2.9133129 |
| Median Absolute Deviation (MAD) | 0.81 |
| Skewness | 2.9949129 |
| Sum | 66129.29 |
| Variance | 13.34566 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 531 | 2.3% |
| 0.9 | 507 | 2.2% |
| 0.8 | 497 | 2.2% |
| 1.1 | 490 | 2.2% |
| 0.7 | 469 | 2.1% |
| 1.2 | 451 | 2.0% |
| 0.6 | 427 | 1.9% |
| 1.3 | 423 | 1.9% |
| 1.4 | 392 | 1.7% |
| 1.5 | 367 | 1.6% |
| Other values (1535) | 18145 | 79.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 148 | 0.7% |
| 0.01 | 7 | < 0.1% |
| 0.02 | 11 | < 0.1% |
| 0.03 | 4 | < 0.1% |
| 0.04 | 4 | < 0.1% |
| 0.05 | 1 | < 0.1% |
| 0.06 | 3 | < 0.1% |
| 0.07 | 5 | < 0.1% |
| 0.08 | 3 | < 0.1% |
| 0.09 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 33.96 | 1 | < 0.1% |
| 33.92 | 1 | < 0.1% |
| 32.72 | 1 | < 0.1% |
| 31.95 | 1 | < 0.1% |
| 30.83 | 1 | < 0.1% |
| 30.5 | 1 | < 0.1% |
| 30.33 | 1 | < 0.1% |
| 28.23 | 1 | < 0.1% |
| 28.2 | 1 | < 0.1% |
| 27.97 | 1 | < 0.1% |
RatecodeID
Real number (ℝ)
High correlation Skewed
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.043394 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.70839088 |
|---|---|
| Coefficient of variation (CV) | 0.67892943 |
| Kurtosis | 16112.97 |
| Mean | 1.043394 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 117.14121 |
| Sum | 23684 |
| Variance | 0.50181765 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 22070 | 97.2% |
| 2 | 513 | 2.3% |
| 5 | 68 | 0.3% |
| 3 | 39 | 0.2% |
| 4 | 8 | < 0.1% |
| 99 | 1 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 22070 | 97.2% |
| 2 | 513 | 2.3% |
| 3 | 39 | 0.2% |
| 4 | 8 | < 0.1% |
| 5 | 68 | 0.3% |
| 99 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 1 | < 0.1% |
| 5 | 68 | 0.3% |
| 4 | 8 | < 0.1% |
| 3 | 39 | 0.2% |
| 2 | 513 | 2.3% |
| 1 | 22070 | 97.2% |
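A single observation with RatecodeID 99 appears to account for most of the reported skewness. The sketch below recomputes γ1 from the frequency table with and without that row. It assumes the report uses the bias-adjusted Fisher-Pearson estimator, an assumption that reproduces the reported value; `adjusted_skewness` is a hypothetical helper, not part of any library.

```python
import math


def adjusted_skewness(freq):
    """Bias-adjusted Fisher-Pearson skewness from a value -> count table."""
    n = sum(freq.values())
    mean = sum(v * c for v, c in freq.items()) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum(c * (v - mean) ** 2 for v, c in freq.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in freq.items()) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment factor sqrt(n(n-1)) / (n-2).
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# Value -> count, copied from the RatecodeID frequency table.
ratecode = {1: 22070, 2: 513, 3: 39, 4: 8, 5: 68, 99: 1}

skew_all = adjusted_skewness(ratecode)
skew_without_99 = adjusted_skewness({v: c for v, c in ratecode.items() if v != 99})

print(f"skewness with 99:    {skew_all:.4f}")
print(f"skewness without 99: {skew_without_99:.4f}")
```

Because RatecodeID is a code rather than a measured quantity, the skewness figure mostly reflects how the codes happen to be numbered, so treating the column as categorical may be more informative.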
store_and_fwd_flag
Boolean
Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.3 KiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 22600 | 99.6% |
| True | 99 | 0.4% |
PULocationID
Real number (ℝ)
| Distinct | 152 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 162.41235 |
| Minimum | 1 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 114 |
| median | 162 |
| Q3 | 233 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 119 |
Descriptive statistics
| Standard deviation | 66.633373 |
|---|---|
| Coefficient of variation (CV) | 0.41027282 |
| Kurtosis | -0.89920699 |
| Mean | 162.41235 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | -0.25777668 |
| Sum | 3686598 |
| Variance | 4440.0064 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 237 | 890 | 3.9% |
| 161 | 861 | 3.8% |
| 186 | 792 | 3.5% |
| 236 | 785 | 3.5% |
| 162 | 779 | 3.4% |
| 234 | 749 | 3.3% |
| 170 | 749 | 3.3% |
| 48 | 741 | 3.3% |
| 230 | 739 | 3.3% |
| 142 | 649 | 2.9% |
| Other values (142) | 14965 | 65.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 3 | < 0.1% |
| 4 | 60 | 0.3% |
| 7 | 37 | 0.2% |
| 10 | 1 | < 0.1% |
| 12 | 9 | < 0.1% |
| 13 | 227 | 1.0% |
| 14 | 2 | < 0.1% |
| 17 | 6 | < 0.1% |
| 24 | 62 | 0.3% |
| 25 | 25 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 14 | 0.1% |
| 264 | 345 | 1.5% |
| 263 | 392 | 1.7% |
| 262 | 259 | 1.1% |
| 261 | 130 | 0.6% |
| 260 | 21 | 0.1% |
| 258 | 1 | < 0.1% |
| 256 | 14 | 0.1% |
| 255 | 33 | 0.1% |
| 249 | 483 | 2.1% |
DOLocationID
Real number (ℝ)
| Distinct | 216 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 161.528 |
| Minimum | 1 |
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 43 |
| Q1 | 112 |
| median | 162 |
| Q3 | 233 |
| 95-th percentile | 257.1 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 121 |
Descriptive statistics
| Standard deviation | 70.139691 |
|---|---|
| Coefficient of variation (CV) | 0.43422622 |
| Kurtosis | -0.94501782 |
| Mean | 161.528 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.32848288 |
| Sum | 3666524 |
| Variance | 4919.5762 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 161 | 858 | 3.8% |
| 236 | 802 | 3.5% |
| 230 | 761 | 3.4% |
| 237 | 759 | 3.3% |
| 170 | 699 | 3.1% |
| 162 | 681 | 3.0% |
| 234 | 661 | 2.9% |
| 186 | 653 | 2.9% |
| 48 | 619 | 2.7% |
| 142 | 612 | 2.7% |
| Other values (206) | 15594 | 68.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 34 | 0.1% |
| 4 | 101 | 0.4% |
| 7 | 89 | 0.4% |
| 9 | 2 | < 0.1% |
| 10 | 6 | < 0.1% |
| 11 | 2 | < 0.1% |
| 12 | 24 | 0.1% |
| 13 | 230 | 1.0% |
| 14 | 19 | 0.1% |
| 15 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 60 | 0.3% |
| 264 | 304 | 1.3% |
| 263 | 369 | 1.6% |
| 262 | 261 | 1.1% |
| 261 | 117 | 0.5% |
| 260 | 19 | 0.1% |
| 259 | 3 | < 0.1% |
| 258 | 2 | < 0.1% |
| 257 | 17 | 0.1% |
| 256 | 55 | 0.2% |
payment_type
Categorical
Imbalance
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 15265 | 67.2% |
| 2 | 7267 | 32.0% |
| 3 | 121 | 0.5% |
| 4 | 46 | 0.2% |
fare_amount
Real number (ℝ)
High correlation Skewed
| Distinct | 185 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.026629 |
| Minimum | -120 |
| Maximum | 999.99 |
| Zeros | 6 |
| Zeros (%) | < 0.1% |
| Negative | 14 |
| Negative (%) | 0.1% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | -120 |
|---|---|
| 5-th percentile | 4.5 |
| Q1 | 6.5 |
| median | 9.5 |
| Q3 | 14.5 |
| 95-th percentile | 36 |
| Maximum | 999.99 |
| Range | 1119.99 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 13.243791 |
|---|---|
| Coefficient of variation (CV) | 1.0166706 |
| Kurtosis | 1420.1897 |
| Mean | 13.026629 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 21.663101 |
| Sum | 295691.46 |
| Variance | 175.39799 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1163 | 5.1% |
| 6.5 | 1089 | 4.8% |
| 5.5 | 1081 | 4.8% |
| 7 | 1067 | 4.7% |
| 7.5 | 1018 | 4.5% |
| 5 | 994 | 4.4% |
| 8.5 | 984 | 4.3% |
| 8 | 947 | 4.2% |
| 9 | 885 | 3.9% |
| 9.5 | 869 | 3.8% |
| Other values (175) | 12602 | 55.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -120 | 1 | < 0.1% |
| -4.5 | 2 | < 0.1% |
| -4 | 2 | < 0.1% |
| -3.5 | 3 | < 0.1% |
| -3 | 2 | < 0.1% |
| -2.5 | 4 | < 0.1% |
| 0 | 6 | < 0.1% |
| 0.01 | 2 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2.5 | 104 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 999.99 | 1 | < 0.1% |
| 450 | 1 | < 0.1% |
| 200.01 | 1 | < 0.1% |
| 200 | 1 | < 0.1% |
| 175 | 1 | < 0.1% |
| 152 | 1 | < 0.1% |
| 150 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 131 | 1 | < 0.1% |
| 120 | 2 | < 0.1% |
extra
Real number (ℝ)
High correlation Zeros
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.33327459 |
| Minimum | -1 |
| Maximum | 4.5 |
| Zeros | 11921 |
| Zeros (%) | 52.5% |
| Negative | 9 |
| Negative (%) | < 0.1% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.5 |
| 95-th percentile | 1 |
| Maximum | 4.5 |
| Range | 5.5 |
| Interquartile range (IQR) | 0.5 |
Descriptive statistics
| Standard deviation | 0.46309658 |
|---|---|
| Coefficient of variation (CV) | 1.3895346 |
| Kurtosis | 27.000218 |
| Mean | 0.33327459 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.5250157 |
| Sum | 7565 |
| Variance | 0.21445844 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11921 | 52.5% |
| 0.5 | 7104 | 31.3% |
| 1 | 3564 | 15.7% |
| 4.5 | 101 | 0.4% |
| -0.5 | 7 | < 0.1% |
| -1 | 2 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1 | 2 | < 0.1% |
| -0.5 | 7 | < 0.1% |
| 0 | 11921 | 52.5% |
| 0.5 | 7104 | 31.3% |
| 1 | 3564 | 15.7% |
| 4.5 | 101 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.5 | 101 | 0.4% |
| 1 | 3564 | 15.7% |
| 0.5 | 7104 | 31.3% |
| 0 | 11921 | 52.5% |
| -0.5 | 7 | < 0.1% |
| -1 | 2 | < 0.1% |
mta_tax
Categorical
High correlation Imbalance
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0005727 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 22596 | 99.5% |
| 0.0 | 90 | 0.4% |
| -0.5 | 13 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 22789 | 33.5% |
| . | 22699 | 33.3% |
| 5 | 22609 | 33.2% |
| - | 13 | < 0.1% |
tip_amount
Real number (ℝ)
High correlation Zeros
| Distinct | 742 |
|---|---|
| Distinct (%) | 3.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8357813 |
| Minimum | 0 |
| Maximum | 200 |
| Zeros | 8057 |
| Zeros (%) | 35.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.35 |
| Q3 | 2.45 |
| 95-th percentile | 6.35 |
| Maximum | 200 |
| Range | 200 |
| Interquartile range (IQR) | 2.45 |
Descriptive statistics
| Standard deviation | 2.8006263 |
|---|---|
| Coefficient of variation (CV) | 1.5255773 |
| Kurtosis | 1124.3261 |
| Mean | 1.8357813 |
| Median Absolute Deviation (MAD) | 1.35 |
| Skewness | 18.188305 |
| Sum | 41670.4 |
| Variance | 7.8435075 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8057 | 35.5% |
| 1 | 1451 | 6.4% |
| 2 | 756 | 3.3% |
| 1.5 | 303 | 1.3% |
| 3 | 237 | 1.0% |
| 1.66 | 222 | 1.0% |
| 1.45 | 210 | 0.9% |
| 1.36 | 205 | 0.9% |
| 1.55 | 202 | 0.9% |
| 1.26 | 202 | 0.9% |
| Other values (732) | 10854 | 47.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8057 | 35.5% |
| 0.01 | 8 | < 0.1% |
| 0.02 | 4 | < 0.1% |
| 0.03 | 1 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.07 | 1 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 5 | < 0.1% |
| 0.12 | 1 | < 0.1% |
| 0.15 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 200 | 1 | < 0.1% |
| 55.5 | 1 | < 0.1% |
| 51.64 | 1 | < 0.1% |
| 46.69 | 1 | < 0.1% |
| 42.29 | 1 | < 0.1% |
| 28 | 1 | < 0.1% |
| 25.2 | 2 | < 0.1% |
| 25 | 1 | < 0.1% |
| 22.22 | 1 | < 0.1% |
| 21.3 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
High correlation Zeros
| Distinct | 38 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.31254152 |
| Minimum | 0 |
| Maximum | 19.1 |
| Zeros | 21525 |
| Zeros (%) | 94.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 5.54 |
| Maximum | 19.1 |
| Range | 19.1 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.3992119 |
|---|---|
| Coefficient of variation (CV) | 4.4768833 |
| Kurtosis | 31.865134 |
| Mean | 0.31254152 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.0827272 |
| Sum | 7094.38 |
| Variance | 1.957794 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 21525 | 94.8% |
| 5.76 | 847 | 3.7% |
| 5.54 | 239 | 1.1% |
| 10.5 | 21 | 0.1% |
| 12.5 | 11 | < 0.1% |
| 2.64 | 10 | < 0.1% |
| 2.54 | 6 | < 0.1% |
| 11.52 | 3 | < 0.1% |
| 16.26 | 3 | < 0.1% |
| 16.5 | 2 | < 0.1% |
| Other values (28) | 32 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 21525 | 94.8% |
| 2.16 | 1 | < 0.1% |
| 2.54 | 6 | < 0.1% |
| 2.64 | 10 | < 0.1% |
| 2.7 | 1 | < 0.1% |
| 4.32 | 1 | < 0.1% |
| 5.16 | 1 | < 0.1% |
| 5.44 | 1 | < 0.1% |
| 5.45 | 1 | < 0.1% |
| 5.49 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19.1 | 1 | < 0.1% |
| 18.28 | 1 | < 0.1% |
| 18.26 | 1 | < 0.1% |
| 18 | 2 | < 0.1% |
| 17.5 | 1 | < 0.1% |
| 17.28 | 1 | < 0.1% |
| 16.62 | 1 | < 0.1% |
| 16.5 | 2 | < 0.1% |
| 16.26 | 3 | < 0.1% |
| 16.2 | 1 | < 0.1% |
improvement_surcharge
Categorical
High correlation Imbalance
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0006168 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.3 |
|---|---|
| 2nd row | 0.3 |
| 3rd row | 0.3 |
| 4th row | 0.3 |
| 5th row | 0.3 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3 | 22679 | 99.9% |
| -0.3 | 14 | 0.1% |
| 0.0 | 6 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 22705 | 33.3% |
| . | 22699 | 33.3% |
| 3 | 22693 | 33.3% |
| - | 14 | < 0.1% |
total_amount
Real number (ℝ)
High correlation Skewed
| Distinct | 1369 |
|---|---|
| Distinct (%) | 6.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.310502 |
| Minimum | -120.3 |
| Maximum | 1200.29 |
| Zeros | 4 |
| Zeros (%) | < 0.1% |
| Negative | 14 |
| Negative (%) | 0.1% |
| Memory size | 177.5 KiB |
Quantile statistics
| Minimum | -120.3 |
|---|---|
| 5-th percentile | 5.8 |
| Q1 | 8.75 |
| median | 11.8 |
| Q3 | 17.8 |
| 95-th percentile | 46.06 |
| Maximum | 1200.29 |
| Range | 1320.59 |
| Interquartile range (IQR) | 9.05 |
Descriptive statistics
| Standard deviation | 16.097295 |
|---|---|
| Coefficient of variation (CV) | 0.98692824 |
| Kurtosis | 1321.9239 |
| Mean | 16.310502 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 20.389403 |
| Sum | 370232.09 |
| Variance | 259.12292 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.3 | 541 | 2.4% |
| 7.8 | 531 | 2.3% |
| 6.8 | 524 | 2.3% |
| 8.3 | 519 | 2.3% |
| 8.8 | 492 | 2.2% |
| 10.3 | 464 | 2.0% |
| 9.3 | 458 | 2.0% |
| 6.3 | 447 | 2.0% |
| 5.8 | 420 | 1.9% |
| 9.8 | 406 | 1.8% |
| Other values (1359) | 17897 | 78.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -120.3 | 1 | < 0.1% |
| -5.8 | 2 | < 0.1% |
| -5.3 | 2 | < 0.1% |
| -4.8 | 2 | < 0.1% |
| -4.3 | 3 | < 0.1% |
| -3.8 | 3 | < 0.1% |
| -3.3 | 1 | < 0.1% |
| 0 | 4 | < 0.1% |
| 0.3 | 1 | < 0.1% |
| 0.31 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1200.29 | 1 | < 0.1% |
| 450.3 | 1 | < 0.1% |
| 258.21 | 1 | < 0.1% |
| 233.74 | 1 | < 0.1% |
| 211.8 | 1 | < 0.1% |
| 179.06 | 1 | < 0.1% |
| 157.06 | 1 | < 0.1% |
| 152.3 | 1 | < 0.1% |
| 151.82 | 1 | < 0.1% |
| 150.3 | 1 | < 0.1% |
Interactions
Correlations
| | DOLocationID | PULocationID | RatecodeID | VendorID | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DOLocationID | 1.000 | 0.100 | -0.026 | 0.038 | -0.023 | -0.091 | 0.022 | 0.099 | 0.001 | 0.024 | 0.017 | -0.023 | -0.035 | -0.088 | -0.101 |
| PULocationID | 0.100 | 1.000 | -0.050 | 0.022 | -0.015 | -0.087 | 0.022 | 0.016 | -0.009 | 0.027 | 0.000 | -0.024 | -0.075 | -0.085 | -0.089 |
| RatecodeID | -0.026 | -0.050 | 1.000 | 0.000 | -0.070 | 0.272 | 0.000 | 0.000 | 0.020 | 0.000 | 0.000 | 0.114 | 0.510 | 0.272 | 0.209 |
| VendorID | 0.038 | 0.022 | 0.000 | 1.000 | 0.013 | 0.000 | 0.025 | 0.020 | 0.278 | 0.082 | 0.073 | 0.000 | 0.005 | 0.006 | 0.012 |
| extra | -0.023 | -0.015 | -0.070 | 0.013 | 1.000 | -0.004 | 0.567 | 0.589 | 0.004 | 0.179 | 0.011 | 0.039 | -0.047 | 0.069 | 0.038 |
| fare_amount | -0.091 | -0.087 | 0.272 | 0.000 | -0.004 | 1.000 | 0.188 | 0.202 | 0.028 | 0.084 | 0.000 | 0.394 | 0.362 | 0.978 | 0.912 |
| improvement_surcharge | 0.022 | 0.022 | 0.000 | 0.025 | 0.567 | 0.188 | 1.000 | 0.698 | 0.000 | 0.228 | 0.000 | 0.000 | 0.000 | 0.015 | 0.000 |
| mta_tax | 0.099 | 0.016 | 0.000 | 0.020 | 0.589 | 0.202 | 0.698 | 1.000 | 0.014 | 0.218 | 0.000 | 0.108 | 0.461 | 0.171 | 0.124 |
| passenger_count | 0.001 | -0.009 | 0.020 | 0.278 | 0.004 | 0.028 | 0.000 | 0.014 | 1.000 | 0.023 | 0.020 | -0.020 | 0.014 | 0.024 | 0.039 |
| payment_type | 0.024 | 0.027 | 0.000 | 0.082 | 0.179 | 0.084 | 0.228 | 0.218 | 0.023 | 1.000 | 0.012 | 0.000 | 0.042 | 0.103 | 0.027 |
| store_and_fwd_flag | 0.017 | 0.000 | 0.000 | 0.073 | 0.011 | 0.000 | 0.000 | 0.000 | 0.020 | 0.012 | 1.000 | 0.000 | 0.000 | 0.000 | 0.016 |
| tip_amount | -0.023 | -0.024 | 0.114 | 0.000 | 0.039 | 0.394 | 0.000 | 0.108 | -0.020 | 0.000 | 0.000 | 1.000 | 0.211 | 0.532 | 0.375 |
| tolls_amount | -0.035 | -0.075 | 0.510 | 0.005 | -0.047 | 0.362 | 0.000 | 0.461 | 0.014 | 0.042 | 0.000 | 0.211 | 1.000 | 0.372 | 0.351 |
| total_amount | -0.088 | -0.085 | 0.272 | 0.006 | 0.069 | 0.978 | 0.015 | 0.171 | 0.024 | 0.103 | 0.000 | 0.532 | 0.372 | 1.000 | 0.897 |
| trip_distance | -0.101 | -0.089 | 0.209 | 0.012 | 0.038 | 0.912 | 0.000 | 0.124 | 0.039 | 0.027 | 0.016 | 0.375 | 0.351 | 0.897 | 1.000 |
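The High correlation alerts in the overview line up with the pairs in this matrix whose absolute coefficient is at least 0.5. That threshold is an assumption inferred from the values shown here, not something the report states. The sketch below covers only the nine variables that carry an alert, with coefficients copied from the upper triangle of the matrix.

```python
# Assumed alert threshold; inferred from this report, not stated by it.
THRESHOLD = 0.5

# Variables that carry a "High correlation" alert, in matrix order.
names = [
    "RatecodeID", "extra", "fare_amount", "improvement_surcharge", "mta_tax",
    "tip_amount", "tolls_amount", "total_amount", "trip_distance",
]

# Upper triangle: each row lists coefficients against the variables that
# follow it in `names`, copied from the correlation matrix above.
upper = {
    "RatecodeID": [-0.070, 0.272, 0.000, 0.000, 0.114, 0.510, 0.272, 0.209],
    "extra": [-0.004, 0.567, 0.589, 0.039, -0.047, 0.069, 0.038],
    "fare_amount": [0.188, 0.202, 0.394, 0.362, 0.978, 0.912],
    "improvement_surcharge": [0.698, 0.000, 0.000, 0.015, 0.000],
    "mta_tax": [0.108, 0.461, 0.171, 0.124],
    "tip_amount": [0.211, 0.532, 0.375],
    "tolls_amount": [0.372, 0.351],
    "total_amount": [0.897],
}

high_pairs = []
for i, row_name in enumerate(names[:-1]):
    for offset, r in enumerate(upper[row_name]):
        col_name = names[i + 1 + offset]
        if abs(r) >= THRESHOLD:
            high_pairs.append((row_name, col_name, r))

# Print the strongest pairs first.
for a, b, r in sorted(high_pairs, key=lambda t: -abs(t[2])):
    print(f"{a:>22} ~ {b:<22} {r:+.3f}")
```

Eight pairs clear the threshold, which matches the sixteen variable-to-variable mentions across the nine alerts (each pair is reported once from each side).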
Missing values
Sample
First rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 03/25/2017 8:55:43 AM | 03/25/2017 9:09:47 AM | 6 | 3.34 | 1 | N | 100 | 231 | 1 | 13.0 | 0.0 | 0.5 | 2.76 | 0.0 | 0.3 | 16.56 |
| 1 | 1 | 04/11/2017 2:53:28 PM | 04/11/2017 3:19:58 PM | 1 | 1.80 | 1 | N | 186 | 43 | 1 | 16.0 | 0.0 | 0.5 | 4.00 | 0.0 | 0.3 | 20.80 |
| 2 | 1 | 12/15/2017 7:26:56 AM | 12/15/2017 7:34:08 AM | 1 | 1.00 | 1 | N | 262 | 236 | 1 | 6.5 | 0.0 | 0.5 | 1.45 | 0.0 | 0.3 | 8.75 |
| 3 | 2 | 05/07/2017 1:17:59 PM | 05/07/2017 1:48:14 PM | 1 | 3.70 | 1 | N | 188 | 97 | 1 | 20.5 | 0.0 | 0.5 | 6.39 | 0.0 | 0.3 | 27.69 |
| 4 | 2 | 04/15/2017 11:32:20 PM | 04/15/2017 11:49:03 PM | 1 | 4.37 | 1 | N | 4 | 112 | 2 | 16.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 17.80 |
| 5 | 2 | 03/25/2017 8:34:11 PM | 03/25/2017 8:42:11 PM | 6 | 2.30 | 1 | N | 161 | 236 | 1 | 9.0 | 0.5 | 0.5 | 2.06 | 0.0 | 0.3 | 12.36 |
| 6 | 2 | 05/03/2017 7:04:09 PM | 05/03/2017 8:03:47 PM | 1 | 12.83 | 1 | N | 79 | 241 | 1 | 47.5 | 1.0 | 0.5 | 9.86 | 0.0 | 0.3 | 59.16 |
| 7 | 2 | 08/15/2017 5:41:06 PM | 08/15/2017 6:03:05 PM | 1 | 2.98 | 1 | N | 237 | 114 | 1 | 16.0 | 1.0 | 0.5 | 1.78 | 0.0 | 0.3 | 19.58 |
| 8 | 2 | 02/04/2017 4:17:07 PM | 02/04/2017 4:29:14 PM | 1 | 1.20 | 1 | N | 234 | 249 | 2 | 9.0 | 0.0 | 0.5 | 0.00 | 0.0 | 0.3 | 9.80 |
| 9 | 1 | 11/10/2017 3:20:29 PM | 11/10/2017 3:40:55 PM | 1 | 1.60 | 1 | N | 239 | 237 | 1 | 13.0 | 0.0 | 0.5 | 2.75 | 0.0 | 0.3 | 16.55 |
Last rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22689 | 2 | 03/07/2017 12:25:52 PM | 03/07/2017 12:39:40 PM | 1 | 1.96 | 1 | N | 113 | 13 | 1 | 11.0 | 0.0 | 0.5 | 2.36 | 0.00 | 0.3 | 14.16 |
| 22690 | 2 | 09/21/2017 1:44:42 PM | 09/21/2017 1:52:06 PM | 1 | 0.89 | 1 | N | 43 | 142 | 1 | 7.0 | 0.0 | 0.5 | 1.95 | 0.00 | 0.3 | 9.75 |
| 22691 | 2 | 01/06/2017 1:50:14 AM | 01/06/2017 1:56:47 AM | 1 | 2.12 | 1 | N | 170 | 79 | 1 | 8.0 | 0.5 | 0.5 | 0.00 | 0.00 | 0.3 | 9.30 |
| 22692 | 1 | 07/16/2017 3:22:51 AM | 07/16/2017 3:40:52 AM | 1 | 5.70 | 1 | N | 249 | 17 | 1 | 19.0 | 0.5 | 0.5 | 4.05 | 0.00 | 0.3 | 24.35 |
| 22693 | 2 | 08/10/2017 10:20:04 PM | 08/10/2017 10:29:31 PM | 1 | 0.89 | 1 | N | 229 | 170 | 1 | 7.5 | 0.5 | 0.5 | 1.76 | 0.00 | 0.3 | 10.56 |
| 22694 | 2 | 02/24/2017 5:37:23 PM | 02/24/2017 5:40:39 PM | 3 | 0.61 | 1 | N | 48 | 186 | 2 | 4.0 | 1.0 | 0.5 | 0.00 | 0.00 | 0.3 | 5.80 |
| 22695 | 2 | 08/06/2017 4:43:59 PM | 08/06/2017 5:24:47 PM | 1 | 16.71 | 2 | N | 132 | 164 | 1 | 52.0 | 0.0 | 0.5 | 14.64 | 5.76 | 0.3 | 73.20 |
| 22696 | 2 | 09/04/2017 2:54:14 PM | 09/04/2017 2:58:22 PM | 1 | 0.42 | 1 | N | 107 | 234 | 2 | 4.5 | 0.0 | 0.5 | 0.00 | 0.00 | 0.3 | 5.30 |
| 22697 | 2 | 07/15/2017 12:56:30 PM | 07/15/2017 1:08:26 PM | 1 | 2.36 | 1 | N | 68 | 144 | 1 | 10.5 | 0.0 | 0.5 | 1.70 | 0.00 | 0.3 | 13.00 |
| 22698 | 1 | 03/02/2017 1:02:49 PM | 03/02/2017 1:16:09 PM | 1 | 2.10 | 1 | N | 239 | 236 | 1 | 11.0 | 0.0 | 0.5 | 2.35 | 0.00 | 0.3 | 14.15 |
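The strong correlation between fare_amount and total_amount is easier to read once you notice that, in the sample rows, total_amount equals the sum of the six charge columns. The check below uses a handful of rows copied from the tables above and compares amounts in whole cents to avoid floating-point noise. It says nothing about rows outside this sample, and the helper name `to_cents` is illustrative.

```python
# Columns per row: fare_amount, extra, mta_tax, tip_amount, tolls_amount,
# improvement_surcharge, total_amount. Keys are the row indices shown above.
sample_rows = {
    0: (13.0, 0.0, 0.5, 2.76, 0.0, 0.3, 16.56),
    1: (16.0, 0.0, 0.5, 4.00, 0.0, 0.3, 20.80),
    4: (16.5, 0.5, 0.5, 0.00, 0.0, 0.3, 17.80),
    6: (47.5, 1.0, 0.5, 9.86, 0.0, 0.3, 59.16),
    22695: (52.0, 0.0, 0.5, 14.64, 5.76, 0.3, 73.20),
}


def to_cents(amount):
    """Convert a dollar amount to integer cents."""
    return round(amount * 100)


# Collect the indices of rows whose components do not add up to the total.
mismatches = [
    idx
    for idx, (*parts, total) in sample_rows.items()
    if sum(to_cents(p) for p in parts) != to_cents(total)
]

print("rows where components do not sum to total_amount:", mismatches)
```

If the identity holds across the full dataset, total_amount is a derived column, and using it alongside its components as model inputs would duplicate information.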